Skip to content

fix(evmrpc): cap StoreTracer access log to prevent debug_traceStateAccess issue (PLT-360)#3653

Open
amir-deris wants to merge 4 commits into
mainfrom
amir/plt-360-debug-traceStateAccess-memory
Open

fix(evmrpc): cap StoreTracer access log to prevent debug_traceStateAccess issue (PLT-360)#3653
amir-deris wants to merge 4 commits into
mainfrom
amir/plt-360-debug-traceStateAccess-memory

Conversation

@amir-deris

@amir-deris amir-deris commented Jun 26, 2026

Copy link
Copy Markdown
Contributor

Problem

debug_traceStateAccess could add too much memory in a single call. The root cause is in the store StoreTracer: the per-module access log (mt.Accesses) was uncapped. Every Get/Set/Has/Delete/IteratorValue/IteratorNext cloned its key (and value) into that log, so a pathological tx — e.g. a large iterator scan — grew retained memory linearly with the number of keys touched. The existing 16-iterator / 64-key caps only bounded the per-iterator summary, not the raw access log. debug_traceStateAccess makes it worse by replaying the entire block prefix into a single shared tracer.

Fix

Bound the per-module access log while keeping the trace useful:

  • Add maxStoreTraceModuleBytes (4 MiB) plus a fixed perAccessOverheadBytes (64) charge per retained access — the overhead charge bounds high-count / near-zero-payload patterns (e.g. iterator Next() floods), not just large values.
  • Route every access through a single ModuleTrace.record(...): it always updates running op stats, then retains the cloned key/value only while the module stays under the cap. Past the cap the module is flagged Truncated and further accesses are dropped from the log (first-N retention — the earliest reads are the prestate, so this is the correct truncation direction).
  • Accumulate Stats (per-op counts/durations) at record time instead of recomputing them from the log, so they stay exact even when the log is truncated. This matters because the profile report (historicalLookupNanos, trace_profile_report.go) consumes those stats.
  • Surface truncated on ModuleTraceDump so consumers can see the log was bounded.

Net effect: prestate Reads/Has/iterator-key samples are bounded; op stats remain complete.

Tests

  • TestStoreTracerBoundsIteratorScanMemory — drives a 100k-key scan and asserts the tracer retains ≤ 8 MiB (was ~51.5 MB before the fix, i.e. unbounded).
  • TestStoreTracerStatsAccurateWhenTruncated — drives Get past the cap and asserts per-op Count/TotalNanos stay exact (in the internal stats, the per-module dump, and the top-level Stats map) while Truncated is set and Reads is bounded.

Verified go build/go vet clean on sei-cosmos/types, evmrpc/..., sei-db/tools/..., and downstream consumer tests pass (TestKeeperTestSuite/TestViewKeeperStoreTrace, gaskv, giga/deps/xbank).

@cursor

cursor Bot commented Jun 26, 2026

Copy link
Copy Markdown

PR Summary

Medium Risk
Changes debug/trace prestate behavior when modules exceed the cap (partial Reads/Has), but stats remain accurate; low blast radius outside trace RPC paths.

Overview
Caps unbounded StoreTracer memory that could OOM debug_traceStateAccess when a tx performs huge iterator scans or many KV touches.

Per-module access logs are now limited to 4 MiB of retained key/value payload (plus 64 bytes overhead per retained event). All store trace events go through ModuleTrace.record, which always updates running op stats but stops appending to Accesses once the cap is hit, setting Truncated. Dump exposes truncated on each module; Reads/Has reflect only the retained prefix while Stats (used by profile reporting such as historicalLookupNanos) stay complete.

Adds tests for bounded retention on large iterator scans and for accurate stats when the log is truncated.

Reviewed by Cursor Bugbot for commit c9db9ee. Bugbot is set up for automated code reviews on this repo. Configure here.

@amir-deris amir-deris changed the title Added cap on module trace access records, added test fix(evmrpc): cap StoreTracer access log to prevent debug_traceStateAccess OOM (PLT-360) Jun 26, 2026
@amir-deris amir-deris changed the title fix(evmrpc): cap StoreTracer access log to prevent debug_traceStateAccess OOM (PLT-360) fix(evmrpc): cap StoreTracer access log to prevent debug_traceStateAccess issue (PLT-360) Jun 26, 2026
@github-actions

github-actions Bot commented Jun 26, 2026

Copy link
Copy Markdown

The latest Buf updates on your PR. Results from workflow Buf / buf (pull_request).

BuildFormatLintBreakingUpdated (UTC)
✅ passed✅ passed✅ passed✅ passedJun 27, 2026, 12:18 AM

@seidroid seidroid Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A focused, well-reasoned fix that caps the per-module access log to stop debug_traceStateAccess from growing memory unboundedly, while accumulating op stats at record time so they stay accurate under truncation. No correctness issues found; the main gaps are a claimed-but-missing test and transient allocation that the cap doesn't bound.

Findings: 0 blocking | 4 non-blocking | 1 posted inline

Blockers

  • None at the file/PR level.

Non-blocking

  • The PR description claims a TestStoreTracerStatsAccurateWhenTruncated test, but sei-cosmos/types/tracer_test.go contains only TestStoreTracerBoundsIteratorScanMemory. The stats-accuracy-under-truncation behavior is the subtlest part of this change and feeds historicalLookupNanos (evmrpc/trace_profile.go:156-164), which reads the top-level Stats map — please add the missing test so this guarantee is covered (e.g. drive past the cap and assert per-op Count/TotalNanos stay exact while Truncated is set and Reads is bounded).
  • Cursor's second-opinion review (cursor-review.md) was empty — no findings from that pass were available to merge.
  • Edge case worth a one-line note: a single access whose payload alone exceeds maxStoreTraceModuleBytes trips truncation on the very first access (nothing retained, module flagged Truncated). This is acceptable given the bounding goal, but is a slightly surprising consequence of checking accessBytes+cost > cap before retaining anything.
  • 1 suggestion(s)/nit(s) flagged inline on specific lines.

Comment thread sei-cosmos/types/tracer.go
@codecov

codecov Bot commented Jun 26, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 58.20%. Comparing base (4c1b5d2) to head (c9db9ee).

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #3653      +/-   ##
==========================================
- Coverage   58.94%   58.20%   -0.74%     
==========================================
  Files        2262     2178      -84     
  Lines      187031   177384    -9647     
==========================================
- Hits       110243   103248    -6995     
+ Misses      66847    64994    -1853     
+ Partials     9941     9142     -799     
Flag Coverage Δ
sei-chain-pr 87.67% <100.00%> (?)
sei-db 70.41% <ø> (ø)
sei-db-state-db ?

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
sei-cosmos/types/tracer.go 90.35% <100.00%> (+2.37%) ⬆️

... and 113 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@cursor cursor Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes using default effort and found 1 potential issue.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit b08adc1. Configure here.

Comment thread sei-cosmos/types/tracer.go

@seidroid seidroid Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A focused, well-tested fix that caps the StoreTracer per-module access log to bound memory in debug_traceStateAccess while keeping op stats exact under truncation. The core logic (unconditional stats, prefix-retention prestate, per-access overhead charge) is correct; no blocking issues found.

Findings: 0 blocking | 3 non-blocking | 0 posted inline

Blockers

  • None at the file/PR level.

Non-blocking

  • The cap is per-module with no global ceiling. Worst-case retained payload is ~4 MiB × (number of modules touched). Module count is finite so this is bounded, but since debug_traceStateAccess replays an entire block prefix into a single shared tracer, a tx fanning across many modules can still retain tens of MiB. Consider a global byte budget across modules if the OOM was driven by module fan-out rather than a single huge scan.
  • Cursor's second-opinion review (cursor-review.md) was empty — no output produced.
  • Codex's second-opinion review reported no findings but noted it could not run go test ./sei-cosmos/types because the Go toolchain download was network-blocked, so its pass was not test-validated.

Comments that couldn't be anchored to the diff

  • sei-cosmos/types/tracer.go:213 -- [nit] Minor inconsistency (not a bug): after mt.Truncated is set, RecordIteratorValue still appends to it.Keys (bounded at 64) even though the access log is frozen. Iterator key samples are bounded separately, so this is harmless, but the per-iterator sample can keep growing for already-opened iterators after the module log is marked Truncated — worth a one-line note if the intent was to freeze all sampling at truncation.

@seidroid seidroid Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A focused, correct fix that caps the per-module StoreTracer access log at 4 MiB (plus a 64-byte per-access overhead charge) while accumulating op stats at record time so they stay exact under truncation. The design, comments, and two new tests are solid; only minor non-blocking observations remain.

Findings: 0 blocking | 3 non-blocking | 0 posted inline

Blockers

  • None at the file/PR level.

Non-blocking

  • Behavioral nuance worth noting: DerivePrestateToJson (evmrpc/tracers.go:567) feeds the debug_traceTransaction/traceStateAccess AppState. When a module is Truncated, the Reads prestate is now a bounded prefix, so a replay relying on that prestate may be incomplete. This is the intended memory-vs-completeness tradeoff and the truncated flag surfaces it, but consumers of the replayed prestate should ideally check the flag — out of scope here, but worth a follow-up.
  • Per-iterator sampled keys (it.Keys, up to 16 iterators × 64 keys) and iterator Start/End clones are not counted against maxStoreTraceModuleBytes. The amount is small and bounded, so the per-module memory claim still holds, but the cap is on the access-log payload only, not strictly all retained per-module bytes.
  • Second-opinion passes: Codex reported no material findings (with a note it could not run tests due to a Go 1.24 vs required 1.25.6 toolchain mismatch); cursor-review.md was empty, so no Cursor findings were available to merge.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant